OpenFact at CheckThat! 2024: Combining Multiple Attack Methods for Effective Adversarial Text Generation
Lewoniewski, Włodzimierz, Stolarski, Piotr, Stróżyna, Milena, Lewańska, Elżbieta, Wojewoda, Aleksandra, Księżniak, Ewelina, Sawiński, Marcin
This paper presents the experiments and results for the CheckThat! Lab at CLEF 2024 Task 6: Robustness of Credibility Assessment with Adversarial Examples (InCrediblAE). The primary objective of this task was to generate adversarial examples in five problem domains in order to evaluate the robustness of widely used text classification methods (fine-tuned BERT, BiLSTM, and RoBERTa) when applied to credibility assessment. This study explores the application of ensemble learning to enhance adversarial attacks on natural language processing (NLP) models. We systematically tested and refined several adversarial attack methods, including BERT-Attack, genetic algorithms, TextFooler, and CLARE, on five datasets covering various misinformation tasks. By developing modified versions of BERT-Attack and hybrid methods, we achieved significant improvements in attack effectiveness. Our results demonstrate the potential of modifying and combining multiple methods to create more sophisticated and effective adversarial attack strategies, contributing to the development of more robust and secure systems.
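The attack methods named in the abstract all have public implementations, so the combine-multiple-attacks idea is easy to prototype. The sketch below is an illustration only, not the authors' pipeline: the shared task itself ran on the BODEGA framework, whereas this sketch uses the independent TextAttack library's recipes for TextFooler and BERT-Attack against a stand-in BERT victim and keeps, per example, the successful perturbation most similar to the original text. The checkpoint name and toy input are placeholders, not the task's actual victims or data.

```python
from difflib import SequenceMatcher

import transformers
from textattack import Attacker, AttackArgs
from textattack.attack_recipes import BERTAttackLi2020, TextFoolerJin2019
from textattack.attack_results import SuccessfulAttackResult
from textattack.datasets import Dataset
from textattack.models.wrappers import HuggingFaceModelWrapper

# Stand-in victim classifier (placeholder checkpoint, not a task model).
NAME = "textattack/bert-base-uncased-imdb"
model = transformers.AutoModelForSequenceClassification.from_pretrained(NAME)
tokenizer = transformers.AutoTokenizer.from_pretrained(NAME)
victim = HuggingFaceModelWrapper(model, tokenizer)

# Hypothetical (text, label) pair standing in for a task dataset.
dataset = Dataset([("A surprisingly warm and convincing performance.", 1)])

# Run each attack recipe independently; per example, keep the successful
# perturbation that stays closest to the original surface form.
best = {}
for recipe in (TextFoolerJin2019, BERTAttackLi2020):
    attack = recipe.build(victim)
    args = AttackArgs(num_examples=len(dataset), disable_stdout=True)
    for i, result in enumerate(Attacker(attack, dataset, args).attack_dataset()):
        if isinstance(result, SuccessfulAttackResult):
            cand = result.perturbed_text()
            sim = SequenceMatcher(None, result.original_text(), cand).ratio()
            if sim > best.get(i, (0.0, ""))[0]:
                best[i] = (sim, cand)

for i, (sim, cand) in best.items():
    print(f"example {i}: similarity={sim:.2f} -> {cand}")
```

Selecting per example across several attack families is the simplest form of the combination strategy the paper describes; the paper's own gains came from modified and hybridized attacks rather than plain selection.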
- Europe > Poland > Greater Poland Province > Poznań (0.05)
- Europe > France > Auvergne-Rhône-Alpes > Isère > Grenoble (0.04)
- Europe > Switzerland (0.04)
- Europe > Italy > Tuscany > Florence (0.04)
- Information Technology > Security & Privacy (1.00)
- Government > Military (1.00)
Verifying the Robustness of Automatic Credibility Assessment
Przybyła, Piotr, Shvets, Alexander, Saggion, Horacio
Text classification methods have been widely investigated as a way to detect content of low credibility: fake news, social media bots, propaganda, etc. Quite accurate models (likely based on deep neural networks) help in moderating public electronic platforms and often cause content creators to face rejection of their submissions or removal of already published texts. Having the incentive to evade further detection, content creators try to come up with a slightly modified version of the text (known as an attack with an adversarial example) that exploits the weaknesses of classifiers and results in a different output. Here we systematically test the robustness of popular text classifiers against available attacking techniques and discover that, indeed, in some cases insignificant changes in input text can mislead the models. We also introduce BODEGA: a benchmark for testing both victim models and attack methods on four misinformation detection tasks in an evaluation framework designed to simulate real use cases of content moderation. Finally, we manually analyse a subset of adversarial examples and check what kinds of modifications are used in successful attacks. The BODEGA code and data are openly shared in the hope of enhancing the comparability and replicability of further research in this area.
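The benchmark's headline metric is the BODEGA score, which rewards attacks that flip the victim's decision while preserving both meaning and surface form. Below is a minimal sketch of that idea, assuming a product of three terms: a 0/1 confusion term for whether the decision changed, a semantic-similarity term (computed with BLEURT in the benchmark, passed in precomputed here), and a character-level similarity term (approximated here with difflib's ratio rather than the benchmark's Levenshtein-based measure). Consult the paper for the exact definitions.

```python
from difflib import SequenceMatcher

def bodega_style_score(original: str, perturbed: str,
                       decision_flipped: bool, semantic_sim: float) -> float:
    """Hedged sketch of a BODEGA-style attack-quality score.

    Assumptions: the score multiplies (1) a 0/1 confusion term for whether
    the victim's decision changed, (2) a semantic-similarity term in [0, 1]
    (BLEURT in the benchmark; precomputed and passed in here), and (3) a
    character-level similarity term (difflib stand-in for the benchmark's
    Levenshtein-based measure).
    """
    confusion = 1.0 if decision_flipped else 0.0
    char_sim = SequenceMatcher(None, original, perturbed).ratio()
    return confusion * semantic_sim * char_sim

# A one-word substitution that flips the classifier's decision scores high;
# a failed attack scores zero regardless of similarity.
print(bodega_style_score("The claim is entirely false.",
                         "The claim is wholly false.",
                         decision_flipped=True, semantic_sim=0.92))
```

Because the terms are multiplied, an attack that fails to change the decision or that destroys the text's meaning scores near zero, which matches the benchmark's aim of measuring attack quality rather than raw success rate.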
- Europe > France (0.14)
- North America > United States > Minnesota > Hennepin County > Minneapolis (0.14)
- Asia > Middle East > Syria > Aleppo Governorate > Aleppo (0.05)
- (17 more...)
- Media > News (1.00)
- Information Technology > Security & Privacy (1.00)
- Government > Regional Government > North America Government > United States Government (1.00)